Using Early Stopping to Reduce Overfitting in Wrapper-Based Feature Weighting
Abstract
It is acknowledged that overfitting can occur in wrapper-based feature selection when only a limited amount of training data is available. It has also been shown that the severity of overfitting is related to the intensity of the search algorithm used during this process. We demonstrate that overfitting in feature weighting can be exacerbated if the feature weighting is fine-grained: with greater representational power, we risk learning not only the signal but also the idiosyncrasies of the training data. In this paper we show that both of these effects can be ameliorated by the early-stopping strategy we present. Using this strategy, feature weighting outperforms feature selection in most cases.
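The idea in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: it assumes a stochastic hill climber over a grid of weight values, a weighted 1-nearest-neighbour classifier, and a hold-out validation split as the stopping signal. All function names, the synthetic data, and the `levels` and `patience` parameters are hypothetical. Setting `levels=2` restricts weights to 0 or 1 (plain feature selection), while larger values give finer-grained weighting.

```python
import random

def make_data(n, n_noise, rng):
    """Synthetic data (assumption): only the first feature carries signal."""
    rows = []
    for _ in range(n):
        label = rng.randint(0, 1)
        signal = rng.gauss(2.0 * label, 1.0)
        noise = [rng.gauss(0.0, 1.0) for _ in range(n_noise)]
        rows.append(([signal] + noise, label))
    return rows

def predict_1nn(weights, train, x, skip=None):
    """Weighted 1-NN prediction; `skip` excludes one training index."""
    best_d, best_y = float("inf"), None
    for i, (xi, yi) in enumerate(train):
        if i == skip:
            continue
        d = sum(w * (a - b) ** 2 for w, a, b in zip(weights, xi, x))
        if d < best_d:
            best_d, best_y = d, yi
    return best_y

def internal_accuracy(weights, train):
    """Leave-one-out accuracy on the training data (the wrapper criterion)."""
    hits = sum(predict_1nn(weights, train, x, skip=i) == y
               for i, (x, y) in enumerate(train))
    return hits / len(train)

def holdout_accuracy(weights, train, valid):
    """Accuracy on data the search never optimises against."""
    hits = sum(predict_1nn(weights, train, x) == y for x, y in valid)
    return hits / len(valid)

def weight_search(train, valid, levels=5, max_steps=200, patience=10, seed=0):
    """Hill climbing over weights, stopped early on stale validation accuracy.

    `levels` must be at least 2; it sets the granularity of the weight grid.
    """
    rng = random.Random(seed)
    n_feat = len(train[0][0])
    grid = [k / (levels - 1) for k in range(levels)]
    current = [1.0] * n_feat
    current_score = internal_accuracy(current, train)
    best = list(current)
    best_valid = holdout_accuracy(current, train, valid)
    history = [(current_score, best_valid)]
    stale = 0
    for _ in range(max_steps):
        candidate = list(current)
        candidate[rng.randrange(n_feat)] = rng.choice(grid)
        score = internal_accuracy(candidate, train)
        if score < current_score:
            continue  # reject moves that lower internal accuracy
        current, current_score = candidate, score
        valid_score = holdout_accuracy(current, train, valid)
        history.append((current_score, valid_score))
        if valid_score > best_valid:
            best, best_valid, stale = list(current), valid_score, 0
        else:
            stale += 1
            if stale >= patience:
                break  # early stop: validation accuracy has stalled
    return best, best_valid, history

# Usage on a small synthetic problem (1 informative + 5 noise features).
data_rng = random.Random(42)
train_set = make_data(30, 5, data_rng)
valid_set = make_data(30, 5, data_rng)
weights, valid_acc, history = weight_search(train_set, valid_set)
print("weights:", weights)
print("validation accuracy:", valid_acc)
```

The search returns the weights with the best validation accuracy seen so far rather than the last ones visited, so continued gains in internal accuracy that do not carry over to held-out data are discarded.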
Similar resources
Overfitting in Wrapper-Based Feature Subset Selection: The Harder You Try the Worse it Gets
In wrapper-based feature selection, the more states that are visited during the search phase of the algorithm, the greater the likelihood of finding a feature subset that has a high internal accuracy while generalizing poorly. When this occurs, we say that the algorithm has overfitted to the training data. We outline a set of experiments to show this and we introduce a modified genetic algorithm...
Using Early-Stopping to Avoid Overfitting in Wrapper-Based Feature Selection Employing Stochastic Search
It is acknowledged that overfitting can occur in feature selection using the wrapper method when there is a limited amount of training data available. It has also been shown that the severity of overfitting is related to the intensity of the search algorithm used during this process. In this paper we show that two stochastic search techniques (Simulated Annealing and Genetic Algorithms) that ca...
Improving Incremental Wrapper-Based Subset Selection via Replacement and Early Stopping
This paper deals with the problem of feature subset selection in classification-oriented datasets with a (very) large number of attributes. In such datasets complex classical wrapper approaches become intractable due to the high number of wrapper evaluations to be carried out. One way to alleviate this problem is to use the so-called filter-wrapper approach or Incremental Wrapper-based Subset S...
Fast SFFS-Based Algorithm for Feature Selection in Biomedical Datasets
Biomedical datasets usually include a large number of features relative to the number of samples. However, some data dimensions may be less relevant or even irrelevant to the output class. Selection of an optimal subset of features is critical, not only to reduce the processing cost but also to improve the classification results. To this end, this paper presents a hybrid method of filter and wr...
Genetic Algorithms for Feature Selection and Weighting
Automated techniques to optimise the retrieval of relevant cases in a CBR system are desirable as a way to reduce the expensive knowledge acquisition phase. This paper concentrates on feature selection methods that assist in indexing the case-base, and feature weighting methods that improve the similarity-based selection of relevant cases. Two main types of method are presented: filter methods ...